Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent checks from downloading whole repos #28

Merged
merged 7 commits into from
Feb 22, 2024

Conversation

david-baylibre
Copy link
Collaborator

Repo versions are computed from manifests with updated revisions for each project. The repo tool can only generate a manifest with accurate sha1 hashes once all projects have been downloaded locally (repo manifest -r -o new-manifest.xml). This uses a lot of network bandwith, memory, cpu and disk space and is a huge waste of time.
git ls-remote is used instead to fetch revisions much quicker before being injected into the original manifest. A new variable check_jobs is used to spawn concurrent processes to make it close to X times faster
Adjust jobs and check_jobs variables based on the git servers capabilities/limitations

@david-baylibre david-baylibre force-pushed the droze/optimize-check branch 2 times, most recently from 85e9492 to 06e9945 Compare January 15, 2024 15:04
@makohoek
Copy link
Owner

@david-baylibre please rebase this branch on main and fix the lint errors.

Then, I will have a look to review this.

@makohoek
Copy link
Owner

makohoek commented Feb 6, 2024

Copy link
Owner

@makohoek makohoek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @david-baylibre ,

Thank you for this PR. It's a really nice improvement.

I had some difficulties to review this in detail because it's a single big commit doing many things.

For the next version, could you please consider splitting this ?

Here is an example of how this could be split:

  1. requirements: bump up gitrepo version
  2. common: improve url matching regex for http/https
  3. common: add url,revision,name,depth when creating a Repo
  4. common: formatting changes (if you wish to have these)
  5. Implement getRevision using git ls-remote (and add unit tests for them)
  6. Migrate CHECK to use git-ls-remote

Does that make sense ?

README.md Outdated Show resolved Hide resolved
repo_resource/test_check.py Outdated Show resolved Hide resolved
repo_resource/check.py Outdated Show resolved Hide resolved
repo_resource/check.py Show resolved Hide resolved
repo_resource/common.py Outdated Show resolved Hide resolved
repo_resource/common.py Show resolved Hide resolved
repo_resource/common.py Show resolved Hide resolved
repo_resource/common.py Outdated Show resolved Hide resolved
repo_resource/requirements.txt Show resolved Hide resolved
repo_resource/test_in.py Outdated Show resolved Hide resolved
bump gitrepo from 2.31.1 to 2.32.2

Signed-off-by: David Rozé <[email protected]>
improve url matching regex with re module

Signed-off-by: David Rozé <[email protected]>
adding repo settings as class attributes makes it easy to use them
at any time in class methods, and not only in sync method

Signed-off-by: David Rozé <[email protected]>
fix line too long, and remove extra empty lines and unused module repo

Signed-off-by: David Rozé <[email protected]>
Setting revision to None isn't the same as not defining the value.
It makes the test more reliable

Signed-off-by: David Rozé <[email protected]>
@david-baylibre david-baylibre force-pushed the droze/optimize-check branch 2 times, most recently from 3ea8815 to e6fe98f Compare February 16, 2024 09:01
Copy link
Owner

@makohoek makohoek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi David,

Thank you for this new version.

I'm seeing some regression on URLS where we use ssh instead of https (with make tests)
https://gist.github.com/makohoek/c13a71ea7513403c42d64b4ac983fc3d

I would like to see those solved before merging.

Anyways, this is way better than the previous one.

I'm not 100% satisfied with the current design. My main complaints are

Both are no deal-breakers so I'm willing to merge this.
We can create github issues to follow up on those.

Repo versions are computed from manifests with updated revisions for each project.
The repo tool can only generate a manifest with accurate sha1 hashes once all projects
have been downloaded locally (repo manifest -r -o new-manifest.xml).
This uses a lot of network bandwith, memory, cpu and disk space and is a huge waste
of time.
git ls-remote is used instead to fetch revisions much quicker before being injected
into the original manifest. A new variable check_jobs is used to spawn concurrent
processes to make it close to X times faster
Adjust jobs and check_jobs variables based on the git servers capabilities/limitations

Signed-off-by: David Rozé <[email protected]>
Repo versions are computed from manifests with updated revisions for each project.
The repo tool can only generate a manifest with accurate sha1 hashes once all projects
have been downloaded locally (repo manifest -r -o new-manifest.xml).
This uses a lot of network bandwith, memory, cpu and disk space and is a huge waste
of time.
git ls-remote is used instead to fetch revisions much quicker before being injected
into the original manifest. A new variable check_jobs is used to spawn concurrent
processes to make it close to X times faster
Adjust jobs and check_jobs variables based on the git servers capabilities/limitations

Signed-off-by: David Rozé <[email protected]>
@makohoek
Copy link
Owner

@makohoek makohoek merged commit c23fc0c into makohoek:main Feb 22, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants